Non-parametric Message Important Measure: Storage Code Design and Transmission Planning for Big Data
Authors
Shanyun Liu, Rui She and Pingyi Fan are with Tsinghua National Laboratory for Information Science and Technology (TNList) and the Department of Electronic Engineering, Tsinghua University, Beijing, P.R. China, 100084 (e-mail: {liushany16, [email protected], [email protected]}). Khaled B. Letaief is with the Department of Electrical and Computer Engineering, HKUST, Hong Kong (e-mail: {[email protected]}), and Hamad bin Khalifa University, Qatar (e-mail: {[email protected]}).
Abstract
This paper discusses storage and transmission in big data, taking message importance into account. Similar to Shannon entropy and Rényi entropy, we define the non-parametric message important measure (NMIM) as a measure of message importance in the big data scenario; it can characterize the uncertainty of random events. We prove that the proposed NMIM sufficiently describes two key characteristics of big data: the finding of rare events and the large diversity of events. Based on NMIM, we first propose an effective compressed encoding mode for data storage, and then discuss channel transmission over some typical channel models. Numerical simulation results show that the proposed strategy occupies less storage space without losing too much message importance, and that the maximum transmission has a growth region and a saturation region, which helps in designing better practical communication systems.
Index Terms: Non-parametric, Message important measure, Big Data, Compressed Storage, NMIM loss distortion, Channel Transmission.
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61771283, and in part by the China Major State Basic Research Development Program (973 Program) under Grant 2012CB316100(2).
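The abstract positions NMIM alongside Shannon entropy and Rényi entropy. As a point of reference only, the following minimal Python sketch computes those two classical measures for a discrete distribution. The function names `shannon_entropy` and `renyi_entropy` and the example distributions are assumptions made for illustration. The NMIM definition itself is not given in this abstract and is not reproduced here.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha in bits, for alpha > 0 and alpha != 1:
    H_alpha(p) = log2(sum_i p_i ** alpha) / (1 - alpha).
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


if __name__ == "__main__":
    # Hypothetical example distributions (not taken from the paper).
    uniform = [0.25, 0.25, 0.25, 0.25]
    skewed = [0.97, 0.01, 0.01, 0.01]  # three rare events

    for name, dist in (("uniform", uniform), ("skewed", skewed)):
        print(name, shannon_entropy(dist), renyi_entropy(dist, 2.0))
```

For a uniform distribution both measures equal the base-2 logarithm of the number of outcomes, and both decrease as the distribution becomes more concentrated.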
Similar resources
Design and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes and in a wide variety of formats. Data with such features are called big data. Extracting, processing, and visualizing this huge amount of data has today become one of the concerns of data science scholar...
A Non-MDS Erasure Code Scheme for Storage Applications
This paper investigates the use of redundancy and self repairing against node failures in distributed storage systems using a novel non-MDS erasure code. In replication method, access to one replication node is adequate to reconstruct a lost node, while in MDS erasure coded systems which are optimal in terms of redundancy-reliability tradeoff, a single node failure is repaired after recovering the ...
Facilitating Magnetic Recording Technology Scaling for Data Center Hard Disk Drives through Filesystem-Level Transparent Local Erasure Coding
This paper presents a simple yet effective design solution to facilitate technology scaling for hard disk drives (HDDs) being deployed in data centers. Emerging magnetic recording technologies improve storage areal density mainly through reducing the track pitch, which however makes HDDs subject to higher read retry rates. More frequent HDD read retries could cause intolerable tail latency for ...
A Message-Passing Distributed Memory Parallel Algorithm for a Dual-Code Thin Layer, Parabolized Navier-Stokes Solver
In this study, the results of parallelization of a 3-D dual code (Thin Layer, Parabolized Navier-Stokes solver) for solving supersonic turbulent flow around body and wing-body combinations are presented. As a serial code, TLNS solver is very time consuming and takes a large part of memory due to the iterative and lengthy computations. Also for complicated geometries, an exceeding number of grid...
A novel approach in robust group decision making for supply strategic planning
Long-term planning is a challenging process for dealing with problems in big industries. Such problems call for a quick and flexible process of responding to varying requirements. Some of the important strategic decisions to be made in this field are how manufacturing facilities should be used, as well as the assignment and design of the system of delivery...
Journal title:
- CoRR
Volume abs/1709.10280 Issue
Pages -
Publication date 2017